Fast Asynchronous Anti-TrustRank for Web Spam Detection

Authors

  • Joyce Jiyoung Whang
  • Yeon Seong Jeong
  • Inderjit S. Dhillon
  • Seonggoo Kang
  • Jungmin Lee

Abstract

Web spam detection is an important problem in Web search. Since Web spam pages tend to have many spurious links, many Web spam detection algorithms exploit the hyperlink structure between Web pages to detect spam pages. The Anti-TrustRank algorithm is a well-known link-based spam detection algorithm that follows the principle that spam pages are likely to be referenced by other spam pages. Since a real-world Web graph involves tens of billions of nodes, it is crucial to develop work-efficient Web spam detection algorithms. In this paper, we develop asynchronous Anti-TrustRank algorithms that significantly reduce the number of arithmetic operations compared to the traditional synchronous Anti-TrustRank algorithm, without degrading Web spam detection performance. We theoretically prove the convergence of the asynchronous Anti-TrustRank algorithms and conduct experiments on a real-world Web graph indexed by NAVER, the most popular search engine in Korea.

ACM Reference Format: Joyce Jiyoung Whang, Yeon Seong Jeong, Inderjit S. Dhillon, Seonggoo Kang, and Jungmin Lee. 2018. Fast Asynchronous Anti-TrustRank for Web Spam Detection. In Proceedings of WSDM workshop on Misinformation and Misbehavior Mining on the Web (MIS2). ACM, New York, NY, USA, 4 pages. https://doi.org/10.475/123_4
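The abstract describes Anti-TrustRank as propagating distrust from known spam pages to the pages that link to them. It also contrasts the synchronous iteration with asynchronous variants that perform fewer arithmetic operations. The sketch below illustrates that contrast under stated assumptions and is not the authors' implementation.

It assumes the common personalized-PageRank formulation on the reversed link graph, x = (1 - alpha) d + alpha M x. Here M[u][v] = 1/indeg(v) whenever u links to v, the damping is alpha = 0.85, and d is uniform over a hypothetical spam seed set. For the asynchronous variant it swaps in a generic queue-based residual-push scheme. All function names and parameters are hypothetical.

```python
from collections import deque


def reverse_adjacency(edges, n):
    """Return in_nbrs, where in_nbrs[v] lists every page u with a link u -> v."""
    in_nbrs = [[] for _ in range(n)]
    for u, v in edges:
        in_nbrs[v].append(u)
    return in_nbrs


def restart_vector(n, spam_seeds, alpha):
    """(1 - alpha) * d, with d uniform over the (hypothetical) spam seed set."""
    seeds = set(spam_seeds)
    b = [0.0] * n
    for s in seeds:
        b[s] = (1 - alpha) / len(seeds)
    return b


def anti_trustrank_sync(edges, n, spam_seeds, alpha=0.85, tol=1e-10, max_iter=1000):
    """Synchronous (Jacobi) iteration: every sweep recomputes all scores."""
    in_nbrs = reverse_adjacency(edges, n)
    b = restart_vector(n, spam_seeds, alpha)
    x = b[:]
    ops = 0  # number of edge updates performed
    for _ in range(max_iter):
        new_x = b[:]
        for v in range(n):
            if not in_nbrs[v]:
                continue  # assumption: distrust of pages with no in-links is dropped
            # Distrust of v is split evenly among the pages that link to v.
            share = alpha * x[v] / len(in_nbrs[v])
            for u in in_nbrs[v]:
                new_x[u] += share
                ops += 1
        diff = max(abs(a - c) for a, c in zip(new_x, x))
        x = new_x
        if diff < tol:
            break
    return x, ops


def anti_trustrank_async(edges, n, spam_seeds, alpha=0.85, tol=1e-10):
    """Asynchronous residual push: only pages holding a residual above tol do work."""
    in_nbrs = reverse_adjacency(edges, n)
    x = [0.0] * n
    r = restart_vector(n, spam_seeds, alpha)  # distrust not yet propagated
    queue = deque(v for v in range(n) if r[v] > tol)
    in_queue = [r[v] > tol for v in range(n)]
    ops = 0  # number of edge updates performed
    while queue:
        v = queue.popleft()
        in_queue[v] = False
        rv, r[v] = r[v], 0.0
        x[v] += rv  # absorb the residual into v's score
        if not in_nbrs[v]:
            continue
        share = alpha * rv / len(in_nbrs[v])
        for u in in_nbrs[v]:
            r[u] += share
            ops += 1
            # Schedule u only if it now holds enough residual and is not queued yet.
            if r[u] > tol and not in_queue[u]:
                queue.append(u)
                in_queue[u] = True
    return x, ops
```

Consider the toy graph `edges = [(0, 1), (1, 2), (2, 0), (3, 2), (4, 3), (4, 0), (5, 4), (1, 6)]` with page 2 as the only seed. Page 6 links to nothing, so it receives zero distrust. Page 5 reaches page 2 through pages 4 and 3, so it receives a positive score. Both functions agree to within the tolerance.

The `ops` counters record edge updates, so the amount of work of the two schemes can be compared on a given graph. The synchronous version touches every edge in every sweep. The push version only touches edges out of pages that still hold residual distrust.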

Similar resources

Link-Based Characterization and Detection of Web Spam

We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We study several metrics such as degree correlations, number of neighbors, rank propagation through links, TrustRank and others to build several automatic web spam classifiers. This paper presents a study of the performance of each of these classifiers alone, as well as their combined performance. ...

Link-Based Spam Algorithms in Adversarial Information Retrieval

Web spam has become one of the most exciting challenges and threats to Web search engines. The relationship between the search systems and those who try to manipulate them came up with the field of adversarial information retrieval. In this paper, we have set up several experiments to compare HostRank and TrustRank to show how effective it is for TrustRank to combat Web spam and we have also re...

SIGIR 2006 Workshop on Adversarial Information Retrieval on the Web AIRWeb 2006

We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We study several metrics such as degree correlations, number of neighbors, rank propagation through links, TrustRank and others to build several automatic web spam classifiers. This paper presents a study of the performance of each of these classifiers alone, as well as their combined performance. ...

Anti-Trust Rank for Detection of Web Spam and Seed Set Expansion

In the recent times, the Web has been the most popular and perhaps the most efficient platform for sharing, storing as well as retrieving information. Finding the required information from the Web is facilitated by search engines. Search engines form the interface between the Web and the users. Given the vast amount of information available on the Web, search engines must pick a small subset of...

Web Spam Detection with Anti-Trust Rank

Spam pages on the web use various techniques to artificially achieve high rankings in search engine results. Human experts can do a good job of identifying spam pages and pages whose information is of dubious quality, but it is practically infeasible to use human effort for a large number of pages. Similar to the approach in [1], we propose a method of selecting a seed set of pages to be evalua...

Publication date: 2018